Goto

Collaborating Authors

 hierarchical structure



Hierarchical Contrastive Learning for Multimodal Data

arXiv.org Machine Learning

Multimodal representation learning is commonly built on a shared-private decomposition, treating latent information as either common to all modalities or specific to one. This binary view is often inadequate: many factors are shared by only subsets of modalities, and ignoring such partial sharing can over-align unrelated signals and obscure complementary information. We propose Hierarchical Contrastive Learning (HCL), a framework that learns globally shared, partially shared, and modality-specific representations within a unified model. HCL combines a hierarchical latent-variable formulation with structural sparsity and a structure-aware contrastive objective that aligns only modalities that genuinely share a latent factor. Under uncorrelated latent variables, we prove identifiability of the hierarchical decomposition, establish recovery guarantees for the loading matrices, and derive parameter estimation and excess-risk bounds for downstream prediction. Simulations show accurate recovery of hierarchical structure and effective selection of task-relevant components. On multimodal electronic health records, HCL yields more informative representations and consistently improves predictive performance.


Segment Anything without Supervision

Neural Information Processing Systems

The Segmentation Anything Model (SAM) requires labor-intensive data labeling. We present Unsupervised SAM (UnSAM) for promptable and automatic whole-image segmentation that does not require human annotations. UnSAM utilizes a divide-and-conquer strategy to "discover" the hierarchical structure of visual scenes.


EGODE: An Event-attended Graph ODE Framework for Modeling Rigid Dynamics

Neural Information Processing Systems

This paper studies the problem of rigid dynamics modeling, which has a wide range of applications in robotics, graphics, and mechanical design. The problem is partly solved by graph neural network (GNN) simulators. However, these approaches cannot effectively handle the relationship between intrinsic continuity and instantaneous changes in rigid dynamics.




Variational Temporal Abstraction

Neural Information Processing Systems

There have been approaches to learn such hierarchical structure in sequences such as the HMRNN (Chung et al., 2016). However, as a deterministic model, it has the main limitation that it cannot capture the stochastic nature prevailing in the data. In particular,this is acritical limitation to imagination-augmented agents because exploring various possible futures according to the uncertainty is what makes the imagination meaningful in many cases.




Novel positional encodings to enable tree-based transformers

Neural Information Processing Systems

Motivated by this property, we propose a method to extend transformers to tree-structured data, enabling sequence-totree, tree-to-sequence, and tree-to-tree mappings. Our approach abstracts the transformer'ssinusoidal positional encodings, allowing ustoinstead useanovel positional encoding scheme to represent node positions within trees.